Abstract.

1 Introduction 

In February 2000, a major Web advertising firm known as DoubleClick touched 
off a furore in the press with the announcement of a more aggressive policy of con- 
sumer data aggregation. DoubleClick declared that it would begin to integrate 
offline information about consumers into its existing database of online informa- 
tion derived from surveillance of consumer Web surfing [?]. This announcement 
came in the midst of a number of articles in the popular press regarding surrep- 
titious sharing of consumer information. A week earlier, a report released by the 
California HealthCare Foundation alleged that a number of health-related Web 
sites were violating their own stated privacy policies and divulging sensitive
information about customers to third parties [?]. The next day, Reuters reported that
two suits for privacy invasion and an investigation by the Federal Trade Com- 
mission were pending against Amazon.com and its subsidiary Alexa [?]. While 
consumer and privacy advocacy groups vigorously decry such abuses, advertis- 
ers defend their policy of harvesting and exploiting demographic information 
by highlighting the benefits of targeted advertising. Consumers, they maintain, 
are more likely to find interest in advertising tailored to their own preferences, 
and such advertising consequently leads to greater consumer market efficiency. 
The United States government has addressed the issue by promoting a policy of 
industry self-regulation, leading to friction with the European Union, which has 
sought more stringent consumer privacy guarantees. 

In this paper, we show that targeted advertising and consumer privacy need 
not in fact be conflicting aims. We describe a simple, practical technical solution 
that enables use of detailed consumer profiles for the purposes of targeting ad- 
vertisements, but protects these profiles from disclosure to advertisers or hostile 
third parties. Somewhat surprisingly, the most basic embodiments of our idea 
do not even require use of cryptography. 

The underlying idea is quite simple. Rather than gathering information about 
a consumer in order to decide which advertisements to send her, an advertiser 
makes use of a client-side software module called a negotiant. The negotiant 
serves a dual purpose: It acts as a client-side proxy to protect user information, 
and it also directs the targeting of advertisements. The negotiant requests
advertisements from the advertiser that are tailored to the profile provided by the
user. The advertiser can control the palette of advertisements available to the 
negotiant, as well as the process by which it decides which ads to request. At 
the same time, the advertiser learns no information about the consumer pro- 
file beyond which advertisements the negotiant requested. In more sophisticated 
variants, the negotiant is able to participate in a protocol whereby the adver- 
tiser does not even learn what ads a given user has requested, but only sees 
ad requests in the aggregate. The end result is that the advertiser is able to 
target ads with a high degree of sophistication, and also to gather information 
on ad display rates, all without learning significant information about individual 
consumer profiles. 

Of course, some restriction must be placed on advertiser control of nego- 
tiants. Otherwise, the advertiser can manipulate them so as to extract detailed 
profile information from individual consumers. The fact that negotiants may be 
viewed and controlled by users helps offset this vulnerability, as we discuss in 
the body of the paper. An additional point of concern in our proposal is a re- 
striction that it places on advertisers. With the use of negotiants, advertisers 
cannot correlate profile information among users, as is possible when consumer 
profiles are collected in a central location. This drawback should be partially 
offset by the fact that the negotiant may safely and privately gain access to a 
great deal of sensitive information that would otherwise not be available to ad- 
vertisers. Nonetheless, we propose some strategies for addressing this limitation 
in a manner that preserves consumer privacy. 



1.1 Previous Work 

A negotiant may be viewed as a client-side software proxy. The related approach 
of using server proxies as a means of protecting consumer privacy is a well es- 
tablished one, and has even produced a number of commercial ventures. For 
a small subscription fee, companies such as Zero-Knowledge Systems [?] offer
customers an encrypted channel to one or more proxy servers that anonymously 
reroute requests to destination servers. The proxy servers thus act as interme- 
diaries between the client and Web servers, shielding the client from positive 
identification. The client, of course, must trust at least one of the servers to en- 
sure her anonymity, and must also trust servers not to eavesdrop on or tamper 
with her communications. (Encryption or authentication, where available, may
alleviate this latter problem.) Proxy services may be cryptographically
strengthened through the use of mix networks. A mix network is essentially a distributed
cryptographic algorithm for interleaving multiple channels so as to anonymize 
them. We describe the idea in more detail in Section 2.1. For on-the-fly commu- 
nications, however, mix networks are often not practical, and therefore not yet 
employed in real-world applications. 

A variant on the idea of proxy servers is the Crowds project at AT&T Labs 
[?,?,?]. A "crowd" is a group of users, preferably with disparate geographical
and other characteristics, that serve to shield one another's identities. The
service requests of a user in a crowd are randomly rerouted through other crowd
members, rendering the identity of the user indistinguishable from those of other 
crowd members. In this system, trust is embodied partly in an administrative 
server responsible for forming crowds, and partly in other crowd members. In 
particular, the user trusts other crowd members not to eavesdrop on or tamper 
with communications, and, to a lesser extent, not to perform traffic analysis. 

The proxy server and crowd approaches seek to provide a maximum of con- 
sumer privacy. While they can be combined with cookies, or other user tracking 
devices, they do not aim to accommodate more fine-grained control of Web server 
access to user data. The Platform for Privacy Preferences Project, known as P3P 
[?], focuses precisely on this latter problem of refined user control of personal 
demographic information. The goal of P3P is to enable Web sites to publish 
precise specifications of their privacy policies, and to enable users to exercise 
control over their private data in response to these policies. The platform allows 
users to define preferences over what data they are willing to divulge, and also 
policies about how to respond to site practices that are incompatible with their 
preferences. Under the aegis of the World Wide Web Consortium (W3C), P3P
aims to set forth a standard syntax and body of protocols for general use on 
the Web. An initial version of the standard is close to completion, a "last call" 
specification having been released for public review in November 1999. 

Underlying the P3P approach is the presumption that mediation between 
consumers and advertisers is a matter of deciding what information consumers 
choose to reveal explicitly. As we explain above, we set forth a different approach 
in which consumers and advertisers decide jointly, in a privacy-protecting
manner, what advertisements consumers should be provided with. There are a number
of cryptographic tools that enable theoretically strong privacy protection in this 
approach. For the more strongly privacy protecting variants of our negotiant 
scheme, we consider variants on the idea of private information retrieval (PIR). 
A PIR scheme enables a client to request a piece of data from a server - such 
as an advertisement - in such a way that the server learns no information about 
the client request. 

More formally, let Bob be a user, and let Alice be a server that maintains a
database containing items $a = \{a_1, a_2, \ldots, a_n\}$. Alice might represent an
advertiser, and $a$ might represent the collection of advertisements held by Alice. The
aim of a PIR scheme is to enable Bob to retrieve an element $a_r \in a$ of his choice
from Alice in such a way that Alice learns no information about $r$. Of course,
this may be accomplished trivially by having Alice send all of $a$ to Bob. As
shown in [?], however, a single-server PIR scheme may in fact be designed with
$o(n)$ communication, in particular, $n^{\epsilon}$ communication for any $\epsilon > 0$ under the
quadratic residuosity assumption. This was recently improved to $\mathrm{polylog}(n)$
communication overhead under the so-called phi-hiding assumption [?]. A number
of variant PIR schemes have been proposed in the literature, such as symmetric 
PIR (SPIR) schemes, which include the additional property that the client sees 
only the data it has requested [?], and a variant with auxiliary servers [?]. None
of these proposed PIR schemes, however, is practical for wide scale deployment. 

In this paper, we consider a practical alternative to these proposed PIR 
schemes. In order to obtain improved communications and computational effi- 
ciency, we consider two relaxations of the common security model. First, in lieu of 
a single server (Alice), we assume a collection of servers among which a majority 
behave in an honest fashion. We refer to this as a threshold PIR scheme. As we 
show, a threshold PIR scheme is capable of achieving communication overhead 
of $O(1)$ per consumer request under appropriate computational assumptions. As
a second, additional assumption, we consider a scenario in which requests from a 
large number of users may be batched, in which case it is acceptable for servers to 
learn what has been requested, but not by whom. In other words, in consonance 
with the Crowds principle, we permit full disclosure of aggregate information, 
but hide information regarding the requests of individual users. We refer to a 
threshold PIR scheme with this latter property as a semi-private PIR scheme. 
A semi-private PIR scheme, in addition to achieving communication overhead 
of $O(1)$, is computationally quite efficient, involving $O(1)$ basic cryptographic
operations per item per server. 

The negotiant approach we propose in this paper is not necessarily meant 
as a substitute for proxy servers, Crowds, or P3P. It may instead be viewed as 
a complementary technology, deployable in conjunction with any of these other 
ideas. For example, a user might use a proxy server and client-side negotiant 
function together, or might provide demographic information to a trusted server, 
allowing it to make use of a negotiant function. Moreover, any of a range of 
tradeoffs between efficiency and security may be used in the construction of a 
negotiant function. We show this by presenting in this paper not one, but four 
different negotiant schemes. 

1.2 Organization 

In Section 2, we describe the basic cryptographic primitives used in our more 
advanced negotiant protocols. We also formalize the model in which we propose 
our schemes, and set forth basic definitions regarding privacy. In Section 3, we 
propose some negotiant function constructions. We consider some practical im- 
plementation issues in Section 4, and conclude in Section 5 with a brief discussion 
of some future avenues of investigation. 



2 Preliminaries 
2.1 Building blocks 

Let us begin by introducing some of the cryptographic primitives used in the 
more advanced variants of our protocol. Readers familiar with the basic cryp- 
tographic literature may wish to skip to Section 2.2. Most of the protocols we 
describe are $(j, m)$-threshold protocols. These are protocols executed by a
collection of servers $S_1, S_2, \ldots, S_m$, where $m \geq 1$, such that protocol privacy and the
correctness of the output are ensured given an honest coalition of any $j$ servers.
In such protocols, servers hold a private key $x$ in an appropriate distributed
fashion, with a corresponding published public key $y = g^x$. It is common to use
the Pedersen protocol [?,?] as a basis for distributed key generation, although 
see [?] for a caveat. We do not discuss key generation or any of the other or- 
dinary elements of distributed discrete log cryptographic algorithms in detail, 
but instead refer the reader to, e.g., [?] for a survey, or to any of the papers we 
reference with regard to specific algorithms. 

El Gamal cryptosystem: Where we require public key cryptography in our schemes,
we employ the El Gamal cryptosystem [?,?]. Encryption in this scheme takes
place over a group $G_q$ of prime order $q$. Typically, $G_q$ is taken to be a subgroup
of $\mathbb{Z}_p^*$, where $q \mid (p - 1)$. Alternatives are possible; for example, $G_q$ may be the
group of points of an elliptic curve over a finite field.¹

Let $g$ be a generator of $G_q$. This generator is typically regarded as a system
parameter, since it may be used in multiple key pairs. The private encryption key
consists of an integer $x \in_U \mathbb{Z}_q$, where $\in_U$ denotes uniform random selection. The
corresponding public key is defined to be $y = g^x$. To encrypt a message $M \in G_q$,
the sender selects $z \in_U \mathbb{Z}_q$, and computes the ciphertext $(\alpha, \beta) = (M y^z, g^z)$. To
decrypt this ciphertext using the private key $x$, the receiver computes
$\alpha / \beta^x = M y^z / (g^z)^x = M$. We assume a consistent choice of $g$ as a generator for all
instantiations of the El Gamal cryptosystem in this paper.
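
The encryption and decryption equations above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the parameters p = 1019, q = 509, g = 4 are toy values chosen here (not taken from the paper) and are far too small for real use, and the function names are hypothetical.

```python
import random

# Toy parameters (assumptions for illustration only): p = 2q + 1 is a safe
# prime and g = 4 generates the order-q subgroup of quadratic residues mod p.
P, Q, G = 1019, 509, 4

def keygen():
    """Return (x, y): private key x drawn from 1..q-1, public key y = g^x."""
    x = random.randrange(1, Q)
    return x, pow(G, x, P)

def encrypt(y, m):
    """Encrypt a group element m as (alpha, beta) = (m * y^z, g^z)."""
    z = random.randrange(1, Q)
    return (m * pow(y, z, P)) % P, pow(G, z, P)

def decrypt(x, ciphertext):
    """Recover m = alpha / beta^x (modular inverse needs Python 3.8+)."""
    alpha, beta = ciphertext
    return (alpha * pow(pow(beta, x, P), -1, P)) % P

x, y = keygen()
message = pow(G, 42, P)            # an arbitrary element of G_q
recovered = decrypt(x, encrypt(y, message))
```

Note that the random module is not a cryptographic source of randomness; a real implementation would also use a much larger group.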

The El Gamal cryptosystem is semantically secure under the Decision Diffie-
Hellman assumption over $G_q$ [?]. Informally, this means that an attacker who
selects message pair $(m_0, m_1)$ is unable to distinguish between encryptions of
these two messages with probability non-negligibly greater than 1/2. See, e.g.,
[?] for further details.

Let $(\alpha_0, \beta_0) \otimes (\alpha_1, \beta_1) = (\alpha_0 \alpha_1, \beta_0 \beta_1)$. Another useful property of the El
Gamal cryptosystem is the fact that it possesses a homomorphism under the
operator $\otimes$. In particular, observe that if $(\alpha_0, \beta_0)$ and $(\alpha_1, \beta_1)$ represent
ciphertexts corresponding to plaintexts $M_0$ and $M_1$ respectively, then
$(\alpha_0, \beta_0) \otimes (\alpha_1, \beta_1)$ represents an encryption of the plaintext $M_0 M_1$. A consequence of this
homomorphic property is that it is possible, using knowledge of the public key alone,
to derive a random re-encryption $(\alpha', \beta')$ of a given ciphertext $(\alpha, \beta)$. This is
accomplished by computing $(\alpha', \beta') = (\alpha, \beta) \otimes (\gamma, \delta)$, where $(\gamma, \delta)$ represents an
encryption of the plaintext value 1. It is possible to prove quite efficiently in
zero-knowledge that $(\alpha', \beta')$ represents a valid re-encryption of $(\alpha, \beta)$ using, e.g.,
a variant of the Schnorr proof of knowledge protocol [?]. This proof may also be
made non-interactive. See [?] for an overview.
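
As a rough illustration of the homomorphism and of re-encryption with the public key alone, the following Python sketch multiplies ciphertexts componentwise. The toy group (p = 1019, q = 509, g = 4) and all function names are assumptions made for this example; the zero-knowledge proof of correct re-encryption is not modeled.

```python
import random

P, Q, G = 1019, 509, 4             # toy group (assumption), p = 2q + 1

def keygen():
    x = random.randrange(1, Q)
    return x, pow(G, x, P)

def encrypt(y, m):
    z = random.randrange(1, Q)
    return (m * pow(y, z, P)) % P, pow(G, z, P)

def decrypt(x, c):
    return (c[0] * pow(pow(c[1], x, P), -1, P)) % P

def otimes(c0, c1):
    """Componentwise product: encrypts the product of the two plaintexts."""
    return (c0[0] * c1[0]) % P, (c0[1] * c1[1]) % P

def reencrypt(y, c):
    """Multiply by a fresh encryption of 1; the plaintext is unchanged."""
    return otimes(c, encrypt(y, 1))

x, y = keygen()
m0, m1 = pow(G, 5, P), pow(G, 7, P)
c0, c1 = encrypt(y, m0), encrypt(y, m1)
product_plaintext = decrypt(x, otimes(c0, c1))   # equals m0 * m1 mod p
c0_fresh = reencrypt(y, c0)                      # new ciphertext, same plaintext
```

Because the fresh encryption of 1 uses a nonzero exponent, the second component always changes, so the re-encrypted ciphertext differs from the original while decrypting to the same value.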

Quorum controlled asymmetric proxy re-encryption: This is a threshold
algorithm enabling re-encryption of an El Gamal ciphertext under a new key. Input to
the protocol is an El Gamal public key $y'$, as well as a ciphertext $(\alpha, \beta) = E_y[M]$.
The output of the protocol is $(\alpha', \beta') = E_{y'}[M]$. Jakobsson proposes a protocol
that is computationally secure in the sense that it is robust against any
adversary controlling any minority coalition of cheating servers, and also preserves
privacy against such an adversary. Additionally, the protocol is efficient in a
practical sense. See [?] for details.

¹ Most commonly, we let $p = 2q + 1$, and we let $G_q$ be the set of quadratic residues
in $\mathbb{Z}_p^*$. In this setting, plaintexts not in $G_q$ can be mapped onto $G_q$ by appropriate
forcing of the Legendre symbol, e.g., inversion of the associated integer sign.

Distributed plaintext equality test: This is a threshold protocol whereby, given El
Gamal ciphertexts $(\alpha, \beta)$ and $(\alpha', \beta')$, a collection of servers determine whether
the underlying plaintexts are identical. The basic idea is that each server blinds
the quotient ciphertext $(\alpha/\alpha', \beta/\beta')$ by raising both components to a random
exponent, and then proves the blinding correct. The resulting blinded ciphertext is
then decrypted, yielding a 1 if the underlying plaintexts are equal, and a random
value otherwise. See [?] for an efficient and practical protocol construction and
proofs of security. We write $(\alpha, \beta) \approx (\alpha', \beta')$ to denote equality of underlying plaintexts.
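
The blinding idea behind the equality test can be sketched as follows. This is a simplification under explicit assumptions: a single party holds the private key and chooses the blinding exponent, whereas in the protocol these are distributed among servers, and no correctness proofs are produced. The toy group and function names are hypothetical.

```python
import random

P, Q, G = 1019, 509, 4             # toy group (assumption), p = 2q + 1

def keygen():
    x = random.randrange(1, Q)
    return x, pow(G, x, P)

def encrypt(y, m):
    z = random.randrange(1, Q)
    return (m * pow(y, z, P)) % P, pow(G, z, P)

def decrypt(x, c):
    return (c[0] * pow(pow(c[1], x, P), -1, P)) % P

def plaintext_equal(x, c, c_prime):
    """Return True iff c and c_prime encrypt the same group element."""
    # The componentwise quotient encrypts M / M'.
    qa = (c[0] * pow(c_prime[0], -1, P)) % P
    qb = (c[1] * pow(c_prime[1], -1, P)) % P
    # Blinding: raise both components to a random nonzero exponent, so the
    # decryption is 1 when M = M' and an unrelated group element otherwise.
    r = random.randrange(1, Q)
    blinded = (pow(qa, r, P), pow(qb, r, P))
    return decrypt(x, blinded) == 1

x, y = keygen()
m, other = pow(G, 3, P), pow(G, 4, P)
same = plaintext_equal(x, encrypt(y, m), encrypt(y, m))
different = plaintext_equal(x, encrypt(y, m), encrypt(y, other))
```

Since the group has prime order, a nonzero exponent never maps a non-identity element to 1, so the test has no false positives for plaintexts in the subgroup.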

Bulletin Board: Our proposed schemes with multiple players or servers assume 
the availability of a bulletin board. This may be viewed as a piece of memory 
which any player may view and to which all players have appendative write 
access. A bulletin board may be realized as a public broadcast channel, or is 
achievable through Byzantine agreement or some appropriate physical assump- 
tion. See [?,?] for description of a practical implementation of this primitive. 
Postings to a bulletin board may be made authenticable, i.e., their source may
be securely validated, through use of such mechanisms as digital signatures. 
In many cases, our proposed algorithms only require bulletin board access by 
servers, not by other players. 

Mix networks: A critical building block in our protocols is a threshold algorithm 
known as a mix network. Let $E_y[M]$ represent the encryption under public key
$y$ of message $M$ in a probabilistic public-key cryptosystem, typically El Gamal.
This notation is informal, in the sense that it does not take into account the 
random encryption exponent that causes two encryptions of the same plaintext 
to appear different from one another. While we retain this notation for simplicity, 
the reader must bear it in mind, particularly with regard to the fact that mix 
networks involve re-encryption of ciphertexts. 

A mix network takes as input a vector of ciphertexts
$V = \{E_y[M_1], E_y[M_2], \ldots, E_y[M_n]\}$.
Output from the mix network is the vector
$V' = \{E_y[M_{\sigma(1)}], E_y[M_{\sigma(2)}], \ldots, E_y[M_{\sigma(n)}]\}$,
where $\sigma$ is a random permutation on $n$ elements. A mix scheme is said to be
robust if, given a static adversary with active control of a minority coalition of
servers, $V'$ represents a valid permutation and re-encryption of ciphertexts in
$V$ with overwhelming probability. A mix scheme is said to be private if, given
valid output $V'$, for any $i \in \{1, 2, \ldots, n\}$, an adversary with active control of a
minority coalition and passive control of at most $m - 1$ servers cannot determine
$\sigma^{-1}(i)$ with probability non-negligibly larger than $1/n$. It should be noted that
to prevent attacks in which some players post re-encryptions of other players'
inputs, it is often a requirement that input be encrypted in a manner that is
plaintext aware. For this, it suffices that a player presenting El Gamal
ciphertext $(\alpha, \beta)$ also provide a zero-knowledge proof of knowledge of $\log_g \beta$, and that
servers check the correctness of this proof. See [?] for further details.

Mix servers were introduced by Chaum [?] as a basic primitive for privacy.
In his simple formulation, each server $S_i$ takes the output $V_i$ of the previous
server and simply permutes and re-encrypts the ciphertexts therein. While this
scheme is private, it is not robust. Robust, threshold mix networks were
introduced in [?,?]. The most efficient mix network to date is the construction
of Jakobsson, which requires $O(n)$ computation and communication per server.
The scheme makes use of the El Gamal cryptosystem. For further details and
formal definitions, the reader is referred to [?]. Given its robustness and efficiency
on large batches, the Jakobsson construction is probably most appropriate for
our schemes. It will be observed, however, that robustness is not of critical
importance in these schemes, as a server corrupting the computation can at best
insert a false or incorrect advertisement, something likely to be detected if
widespread. Hence, even the mix network proposed by Chaum may often be
appropriate for our proposed schemes.
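
A minimal sketch of the simple permute-and-re-encrypt formulation follows. It is not the robust Jakobsson construction: no proofs of correct mixing are produced, the toy group (p = 1019, q = 509, g = 4) is an assumption made for this example, and random.shuffle is not a cryptographically secure shuffle.

```python
import random

P, Q, G = 1019, 509, 4             # toy group (assumption), p = 2q + 1

def keygen():
    x = random.randrange(1, Q)
    return x, pow(G, x, P)

def encrypt(y, m):
    z = random.randrange(1, Q)
    return (m * pow(y, z, P)) % P, pow(G, z, P)

def decrypt(x, c):
    return (c[0] * pow(pow(c[1], x, P), -1, P)) % P

def reencrypt(y, c):
    one = encrypt(y, 1)
    return (c[0] * one[0]) % P, (c[1] * one[1]) % P

def mix_server(y, ciphertexts):
    """One server: re-encrypt every ciphertext, then permute secretly."""
    out = [reencrypt(y, c) for c in ciphertexts]
    random.shuffle(out)
    return out

def mix_network(y, ciphertexts, num_servers=3):
    """Pass the batch through each server in turn."""
    for _ in range(num_servers):
        ciphertexts = mix_server(y, ciphertexts)
    return ciphertexts

x, y = keygen()
messages = [pow(G, e, P) for e in (1, 2, 3, 4, 5)]
mixed = mix_network(y, [encrypt(y, m) for m in messages])
outputs = [decrypt(x, c) for c in mixed]   # same plaintexts, secret order
```

The output contains the same multiset of plaintexts as the input, but the correspondence between positions stays hidden as long as at least one server keeps its permutation secret.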

There are many variations on mix networks. For example, there are efficient 
mix networks in which $V$ is a vector of $k$-tuples of ciphertexts. Additionally, a
mix network may take either ciphertexts or plaintexts as inputs and likewise 
output either plaintexts or ciphertexts. We employ a variety of such operations 
in our protocols, and do not describe implementation details. 

2.2 Model and definitions for our scheme 

Let $C_1, C_2, \ldots, C_k$ be a collection of consumers toward whom advertisements are
to be directed. Let $P_1, P_2, \ldots, P_k$ be the respective profiles of these consumers.
These profiles may contain any of a variety of pieces of information on the
consumer, including standard demographic information such as age, sex, profession,
annual income, etc., as well as other information such as recently visited URLs
and search engine queries. Let us designate the set of possible consumer profiles
by $\mathcal{P}$. We denote the advertiser by $A$, and let $a = \{a_1, a_2, \ldots, a_n\}$ be the set
of advertisements that $A$ seeks to distribute. The advertiser chooses a negotiant
function $f_A : \mathcal{P} \to \mathbb{Z}_n$. This function takes the profile of a consumer as input and
outputs a choice of advertisement from among $a$. (We shall write $f$ for clarity of
notation, leaving the subscript implicit.) As an example, $f$ might derive a list of
the most common words in URLs visited by the user and seek to match these to
descriptors associated with the ads in $a$. Of course, it is possible to include any
of a wide variety of additional inputs to $f$, such as the current date, or the list of
advertisements already sent to the consumer. We assume that $A$ is represented
by a set of servers $S_1, S_2, \ldots, S_m$, for $m \geq 1$; these servers share a bulletin board.
All consumers post ad requests to the bulletin board. Servers then initiate
communication with consumers and dispense ads to them. The following is a list of
definitions and properties useful in describing negotiant protocols.
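
To make the example of a URL-based negotiant function concrete, here is a small Python sketch. Everything in it is hypothetical: the profile layout, the descriptor lists, the stop-word set, and the scoring rule are assumptions chosen for illustration, and ads are indexed from 0 rather than from 1.

```python
from collections import Counter
import re

STOP_WORDS = {"http", "https", "www"}      # assumed noise words

def negotiant(profile, ads, top=5):
    """Return the index of the ad whose descriptors best match the most
    common words appearing in the consumer's visited URLs."""
    words = Counter()
    for url in profile["urls"]:
        for word in re.split(r"[^a-z]+", url.lower()):
            if len(word) > 3 and word not in STOP_WORDS:
                words[word] += 1
    common = {word for word, _ in words.most_common(top)}
    scores = [len(common & set(ad["descriptors"])) for ad in ads]
    # Ties are broken in favor of the lowest ad index.
    return max(range(len(ads)), key=lambda j: scores[j])

profile = {"urls": ["http://example.com/hiking/boots",
                    "http://example.org/hiking/trails",
                    "http://example.net/camping/gear"]}
ads = [{"descriptors": ["finance", "loans"]},
       {"descriptors": ["hiking", "outdoors", "boots"]},
       {"descriptors": ["cooking"]}]
choice = negotiant(profile, ads)
```

The function runs entirely on the consumer's side; only its output, an ad index, ever needs to leave the client.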

Let $l$ be an appropriately defined security parameter. We say that a function
$q(l)$ is negligible if for any polynomial $p$, there exists a value $d$ such that for $l \geq d$,
we have $q(l) < 1/|p(l)|$. Otherwise, we say that $q$ is non-negligible. We say that
probability $q(l)$ is overwhelming if $1 - q(l)$ is negligible.

Let $A_1$ be a polynomial-time adversary that actively controls a static minority
set of servers, or, if there is only one server, controls that single server. In other
words, let us suppose that $A_1$ controls $\max(\lfloor m/2 \rfloor, 1)$ servers. In addition, let us
suppose that $A_1$ knows $f$ and $a$. Consider the following experiment. We assume
without loss of generality that $A_1$ does not control consumer $C_1$. $A_1$ chooses
a pair of candidate profiles $\tilde{P}_0, \tilde{P}_1 \in \mathcal{P}$. A bit $b \in \{0, 1\}$ is selected at random and $P_1$
is set to $\tilde{P}_b$. Now the protocol is executed, and $A_1$ outputs a guess for $b$. We
say that the protocol has full privacy if for any adversary $A_1$, it is the case
that $\Pr[A_1 \text{ outputs } b] - 1/2$ is negligible, where the probability is taken over the
coin flips of all participants. This definition states informally that the protocol
transcript reveals no significant information about $P_1$, even if all other consumers
are in the control of $A_1$.

Now let us modify the experiment slightly and consider a polynomial-time
algorithm $A_2$ that is provided only with the input $f(P_1), f(P_2), \ldots, f(P_k)$ in a
random order, as well as $f$ and $a$. We say that a negotiant protocol has
profile privacy if for any $A_1$ and any $P_2, P_3, \ldots, P_k$, there exists an $A_2$ such that
$\Pr[A_1 \text{ outputs } b] - \Pr[A_2 \text{ outputs } b]$ is negligible if all consumers execute the
protocol correctly. Again, probabilities are taken over the coin flips of all
participants. In other words, the protocol transcript reveals no significant information
about $P_1$ other than that revealed by the aggregate ad requests of the
participating consumers. Note that when $m = 1$, the property of profile privacy means
that an advertiser learns only the ad requests of a given consumer. When $m > 1$,
the property implies that an advertiser learns only the aggregate ad requests of
a group of consumers.

We say that a negotiant protocol is aggregate transparent if any server can 
determine $f(P_1), f(P_2), \ldots, f(P_k)$ with overwhelming probability. In real-world
advertising scenarios, it is important that a protocol be aggregate transparent, 
as the clients of advertisers typically wish to know how many times their ads 
have been displayed. 

3 Some Negotiant Schemes 

We now present several negotiant schemes representing a small spectrum of 
tradeoffs between security properties and resource costs. 

3.1 Scheme 1: Naive PIR scheme 

We present this simple scheme as a conceptual introduction. Here, requests are
directed from a single consumer $C$ with profile $P$ to a single server $S$. (Thus the
scheme may be modeled by $m = k = 1$.) The scheme is this: The server sends
all of $a$ to $C$, who then views ad $a_{f(P)}$.

Clearly, this scheme enjoys full privacy. The chief drawback is the $O(n)$
communication cost. Another drawback is the fact that the scheme is not aggregate
transparent. Nonetheless, given a limited base of advertisements and good
bandwidth, and if advertisers are satisfied with recording click-through rates, this
scheme may be usable in certain practical scenarios.

3.2 Scheme 2: Direct request scheme 

This is another conceptually simple scheme involving a one-on-one consumer and
server interaction. In this scheme, $C$ simply sends $f(P)$ to $S$, who returns $a_{f(P)}$.
This scheme enjoys profile privacy and has communication and computation
costs $O(1)$. Despite (or because of) its simplicity, it is from a practical standpoint
in fact quite appealing.
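
The difference between Schemes 1 and 2 lies in what the server observes and how much data is transferred. The sketch below contrasts the two; the function names, the toy negotiant function, and the ad strings are assumptions made for illustration, with ads indexed from 0.

```python
def scheme1_naive(ads, f, profile):
    """Scheme 1: the server ships all n ads; selection is client-side."""
    transferred = list(ads)             # O(n) communication
    server_view = None                  # the server receives no request
    return transferred[f(profile)], server_view, len(transferred)

def scheme2_direct(ads, f, profile):
    """Scheme 2: the client sends r = f(P); the server returns one ad."""
    r = f(profile)                      # computed on the client
    server_view = r                     # the server learns exactly r
    return ads[r], server_view, 1       # O(1) communication

def toy_f(profile):
    """Hypothetical negotiant function: pick an ad by age bracket."""
    return 0 if profile["age"] < 30 else 1

ads = ["ad: concert tickets", "ad: retirement planning"]
profile = {"age": 52, "income": 73500}

ad1, view1, cost1 = scheme1_naive(ads, toy_f, profile)
ad2, view2, cost2 = scheme2_direct(ads, toy_f, profile)
```

In both cases the profile itself stays on the client. Scheme 2 trades the full privacy of Scheme 1 for constant communication by revealing the single value f(P).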

3.3 Scheme 3: Semi-private PIR scheme 

We now show how to invoke some of the cryptographic apparatus described above
in order to achieve a semi-private PIR scheme usable as the basis for a negotiant
scheme. Given database $a = \{a_1, a_2, \ldots, a_n\}$, the goal is for a collection of
consumers $C_1, C_2, \ldots, C_k$ to retrieve respective elements $a_{r_1}, a_{r_2}, \ldots, a_{r_k}$ in such
a way that the database servers learn requests only in the aggregate. Of course,
our aim here is to apply this scheme to the retrieval of advertisements, and we
shall present it in this context. In other words, we assume that $r_i = f(P_i)$, i.e.,
users are consumers seeking to retrieve advertisements. As above, we assume
a public/private El Gamal key pair $(y, x)$ held in an appropriate distributed
manner by servers $S_1, S_2, \ldots, S_m$. We also assume that each consumer $C_i$ has a
public/private El Gamal key pair $(y_{C_i}, x_{C_i})$. The scheme is as follows.

1. Each consumer $C_i$ computes $r_i = f(P_i)$ and posts the pair $(E_y[r_i], i)$ to the
bulletin board. Let $V_1 = \{(E_y[r_i], i)\}_{i=1}^{k}$ be a vector of ciphertext/plaintext
pairs accumulated when all consumers have posted their requests.

2. Servers apply a mix network to $V_1$ to obtain $V_2$, where $V_2$ is a vector of pairs
$\{(r_{\sigma_1(i)}, E_y[\sigma_1(i)])\}_{i=1}^{k}$ for a random, secret permutation $\sigma_1$.

3. Servers replace each integer $r_j$ in $V_2$ with $a_{r_j}$. Call the resulting vector $V_2'$.

4. Servers apply a mix network to $V_2'$ to obtain a vector $V_3$, where $V_3$ is a vector
of pairs $\{(E_y[a_{r_{\sigma_2(i)}}], \sigma_2(i))\}_{i=1}^{k}$, and $\sigma_2$ is a random, secret permutation.

5. Let $(E_y[a_{r_i}], i)$ be an element in $V_3$. For each pair, the servers apply quorum
controlled asymmetric proxy re-encryption to obtain $(E_{y_{C_i}}[a_{r_i}], i)$. Let the
resulting vector be $V_4$.

6. For each element $(E_{y_{C_i}}[a_{r_i}], i)$ in $V_4$, servers send $E_{y_{C_i}}[a_{r_i}]$ to $C_i$.

7. Consumers decrypt their respective ciphertexts.
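
The seven steps above can be traced in a toy simulation. The sketch below is heavily simplified and rests on several assumptions that are not part of the scheme itself: the servers are collapsed into a single holder of the key x, each mix is modeled as decrypt, re-encrypt, and shuffle by that one party, proxy re-encryption is modeled as decrypt-then-encrypt, integers are encoded as powers of g, and ads are small group elements indexed from 0. It shows only the data flow, not the security properties.

```python
import random

P, Q, G = 1019, 509, 4                     # toy group (assumption)

def keygen():
    x = random.randrange(1, Q)
    return x, pow(G, x, P)

def enc(y, m):
    z = random.randrange(1, Q)
    return (m * pow(y, z, P)) % P, pow(G, z, P)

def dec(x, c):
    return (c[0] * pow(pow(c[1], x, P), -1, P)) % P

def encode(n):
    """Map a small integer to a group element so it can be encrypted."""
    return pow(G, n, P)

# Brute-force decoding table; feasible only because the toy group is tiny.
DECODE = {pow(G, n, P): n for n in range(Q)}

def semi_private_pir(requests, ads, consumer_pubkeys, x, y):
    # Step 1: consumer i posts (E_y[r_i], i).
    v1 = [(enc(y, encode(r)), i) for i, r in enumerate(requests)]
    # Step 2: first mix; requests emerge in the clear, identities encrypted.
    v2 = [(DECODE[dec(x, c)], enc(y, encode(i))) for c, i in v1]
    random.shuffle(v2)
    aggregate = sorted(r for r, _ in v2)   # the aggregate the servers see
    # Step 3: replace each request r_j by the ad a_{r_j}.
    v2_prime = [(ads[r], ci) for r, ci in v2]
    # Step 4: second mix; ads become encrypted, identities emerge.
    v3 = [(enc(y, ad), DECODE[dec(x, ci)]) for ad, ci in v2_prime]
    random.shuffle(v3)
    # Step 5: proxy re-encryption, modeled here as decrypt-then-encrypt.
    v4 = [(enc(consumer_pubkeys[i], dec(x, c)), i) for c, i in v3]
    # Step 6: deliver each ciphertext to its consumer.
    return {i: c for c, i in v4}, aggregate

x, y = keygen()
consumers = [keygen() for _ in range(4)]
ads = [encode(100 + j) for j in range(3)]  # toy ad payloads
requests = [2, 0, 2, 1]
delivered, aggregate = semi_private_pir(
    requests, ads, [pk for _, pk in consumers], x, y)
# Step 7: each consumer decrypts with its own private key.
received = [dec(consumers[i][0], delivered[i]) for i in range(4)]
```

In an actual deployment no single party would ever hold x or see the link between a request and an identity; that separation is exactly what the mix network and the quorum controlled re-encryption provide.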

The security of the scheme is predicated on that of the underlying mix network.
If we use that proposed in [?], for example, it may be shown that this is a
semi-private PIR scheme, i.e., enjoys profile privacy, relative to the Decision
Diffie-Hellman assumption. Assuming that a public key operation in $G_q$ incurs
cost $O(l^3)$, where the security parameter $l$ is linearly related to the bit length
$|q|$, the computational costs of the scheme are $O(l^3)$ per element per server. The
communication costs of the scheme are $O(1)$. With appropriate implementation
enhancements, some of which we discuss in Section 4, we believe that this scheme
may be deployed in a practical manner.

3.4 Scheme 4: Threshold PIR 

The semi-private PIR scheme described above can be converted into a threshold 
PIR scheme with a few extra steps, and at the expense of additional computa- 
tional overhead. The idea is to perform a blind lookup of consumer ad requests. 
This is accomplished by mixing ads and then invoking the distributed plaintext 
equality test described in Section 2.1. The construction is such that processing 
consumer requests one at a time is as efficient as processing many
simultaneously. We therefore present the protocol as applied to a single consumer $C$ with
profile $P$ and public/private key pair $(y_C, x_C)$. Consumer $C$ computes $r = f(P)$
and posts $E_y[r]$ to the bulletin board. The protocol is then as follows.

1. Servers construct a vector $U_1$ of pairs $\{(j, E_y[a_j])\}_{j=1}^{n}$.

2. Servers mix $U_1$ to obtain a vector $U_2$ of the form $\{(E_y[\sigma(j)], E_y[a_{\sigma(j)}])\}_{j=1}^{n}$ for a
random, secret permutation $\sigma$.

3. For each $j$, the servers perform a distributed plaintext equality test to see
whether $E_y[j] \approx E_y[r]$. Assuming correct protocol execution, when a match
is found, this yields the ciphertext pair $(E_y[r], E_y[a_r])$.

4. The servers apply quorum controlled asymmetric proxy re-encryption to
obtain $E_{y_C}[a_r]$. They send this to $C$.

5. $C$ decrypts $E_{y_C}[a_r]$ to obtain $a_r$.

This protocol has communication complexity $O(1)$. The computational
complexity is $O(n l^3)$ per server.
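
The blind lookup of Scheme 4 can be sketched in the same toy setting. As before, this is a simplification under stated assumptions: one party stands in for the whole server coalition and holds x, the mix is a re-encrypt-and-shuffle by that party, the plaintext equality test is the single-party variant with no proofs, and proxy re-encryption is modeled as decrypt-then-encrypt. Names and parameters are hypothetical.

```python
import random

P, Q, G = 1019, 509, 4                     # toy group (assumption)

def keygen():
    x = random.randrange(1, Q)
    return x, pow(G, x, P)

def enc(y, m):
    z = random.randrange(1, Q)
    return (m * pow(y, z, P)) % P, pow(G, z, P)

def dec(x, c):
    return (c[0] * pow(pow(c[1], x, P), -1, P)) % P

def reenc(y, c):
    one = enc(y, 1)
    return (c[0] * one[0]) % P, (c[1] * one[1]) % P

def encode(n):
    return pow(G, n, P)                    # integer -> group element

def plaintext_equal(x, c, c_prime):
    qa = (c[0] * pow(c_prime[0], -1, P)) % P
    qb = (c[1] * pow(c_prime[1], -1, P)) % P
    r = random.randrange(1, Q)
    return dec(x, (pow(qa, r, P), pow(qb, r, P))) == 1

def threshold_pir(request_ct, ads, consumer_pubkey, x, y):
    # Step 1: pairs (j, E_y[a_j]).
    u1 = [(j, enc(y, ad)) for j, ad in enumerate(ads)]
    # Step 2: mix into pairs (E_y[j], E_y[a_j]) in a secret order.
    u2 = [(enc(y, encode(j)), reenc(y, c)) for j, c in u1]
    random.shuffle(u2)
    # Step 3: blind lookup via the plaintext equality test.
    for index_ct, ad_ct in u2:
        if plaintext_equal(x, index_ct, request_ct):
            # Step 4: proxy re-encryption, modeled as decrypt-then-encrypt.
            return enc(consumer_pubkey, dec(x, ad_ct))
    return None

x, y = keygen()
x_c, y_c = keygen()
ads = [encode(100 + j) for j in range(4)]  # toy ad payloads
r = 2
reply = threshold_pir(enc(y, encode(r)), ads, y_c, x, y)
ad = dec(x_c, reply)                       # Step 5: consumer decrypts
```

The position at which the match occurs is a position in the shuffled vector, so in the real protocol it reveals nothing about r as long as the permutation stays secret.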

4 Security and Implementation Issues 
4.1 Attacks outside the model 

We have offered cryptographically based characterizations of the security of our
schemes, showing their adherence to the appropriate definitions in Section 2.2. In
particular, we see for Schemes 2 and 3 that an attacker in control of a minority
coalition of servers can learn little beyond individual or aggregate ad requests.
As mentioned above, however, even with these security guarantees an advertiser
with full control of the negotiant function $f$ can manipulate it so as to extract
detailed profile information from individual users. Let us suppose, for example,
that an advertiser wishes, through Scheme 2, to learn the approximate annual
household income in dollars of a given consumer $C$ with profile $P$. The advertiser
can, for example, construct a function $f$ such that $f(P) = \lfloor I/10{,}000 \rfloor$, where
$I$ is the annual household income of the consumer. In fact, given enough
latitude in the distribution of the negotiant function to consumers, an advertiser
can even defeat the aggregate security of Scheme 3, by distributing a different
function to each consumer. We propose a number of possible safeguards against
such abuses.



— Open source negotiant function: The idea here is to allow easy reverse
engineering of $f$ by consumers or watchdog organizations. This may be
accomplished by requiring that $f$ be encoded in a high level language, and
even providing software tools for dissecting it. Consumers or organizations
that deem $f$ unduly invasive may refuse to receive advertisements or lodge
complaints against the advertiser, as appropriate.

— If $f$ is unduly obscured or intrusive, consumers may complain against or
boycott the advertiser. To ensure that $f$ is consistent from user to user, a
signed and timestamped hash should be publicly posted, or else $f$ should be
distributed by some trusted site.
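
The income-extraction example and the hash-based consistency check can both be illustrated briefly. In this sketch the profile layout, the function names, and the use of SHA-256 are assumptions; signing and timestamping of the posted digest are omitted because they require a signature scheme outside the Python standard library.

```python
import hashlib

def invasive_f(profile):
    """An abusive negotiant: the requested 'ad index' is simply the
    household income in units of $10,000."""
    return profile["income"] // 10_000

def digest(source: str) -> str:
    """Hash of the negotiant source code, as would be posted publicly."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()

def matches_published(source: str, published_digest: str) -> bool:
    """Client-side check that the received f is the one everyone gets."""
    return digest(source) == published_digest

# What a consumer running invasive_f under Scheme 2 would reveal.
leaked_bracket = invasive_f({"income": 73_500})

# The advertiser publishes one digest; a per-user variant fails the check.
official_source = "def f(profile): return 0 if profile['age'] < 30 else 1"
targeted_source = "def f(profile): return profile['income'] // 10000"
published = digest(official_source)
official_ok = matches_published(official_source, published)
targeted_ok = matches_published(targeted_source, published)
```

The digest check addresses only consistency across users. Whether the published function is itself invasive still has to be judged by inspecting its source, which is the purpose of the open source requirement.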

4.2 Mixing need not be on-the-fly 

A user can request ad $r$ and have it returned at some later time.

4.3 Implementation efficiency enhancements 

Bulk encryption We assume in Schemes 3 and 4 that an advertisement may
be represented as a single ciphertext. Of course, in reality, it is impractical to
use ads small enough or a group $G_q$ large enough to support this assumption.
We may, however, represent an advertisement as a sequence of associated
ciphertexts. Alternatively, we may use an enveloping scheme, and represent an
encryption of $M$ as $E_y[M] = (\gamma, \delta)$, where $\gamma = \{E_y[\kappa_1], E_y[\kappa_2], \ldots, E_y[\kappa_z]\}$ and
$\delta = \epsilon_{\kappa_z}\epsilon_{\kappa_{z-1}} \cdots \epsilon_{\kappa_1}[M]$. Here, $\epsilon_\kappa[M]$ represents a symmetric-key encryption of $M$,
where $\kappa \in K$ is a key from keyspace $K$. To re-encrypt $E_y[M]$ as $(\gamma', \delta')$, a player
does the following:

1. Re-encrypt all ciphertexts in $\gamma$.

2. Select $\kappa_{z+1} \in_U K$.

3. Append $E_y[\kappa_{z+1}]$ to $\gamma$ to obtain $\gamma'$.

4. Compute $\delta'$ as $\epsilon_{\kappa_{z+1}}[\delta]$.
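
The four steps above can be sketched as follows. This is a toy illustration, not a secure implementation: the group parameters (`P = 1019`, `Q = 509`, `G = 4`) are far too small for real use, the symmetric cipher is a hash-based XOR keystream chosen only for brevity, and all function names are hypothetical.

```python
import hashlib
import secrets

# Toy parameters (assumption): P = 2Q + 1 with P, Q prime; G generates the
# order-Q subgroup of quadratic residues modulo P.
P, Q, G = 1019, 509, 4


def keygen():
    x = secrets.randbelow(Q - 1) + 1
    return x, pow(G, x, P)


def eg_encrypt(y, m):
    a = secrets.randbelow(Q)
    return (m * pow(y, a, P) % P, pow(G, a, P))


def eg_reencrypt(y, ct):
    # Multiply component-wise by a fresh encryption of 1.
    one = eg_encrypt(y, 1)
    return (ct[0] * one[0] % P, ct[1] * one[1] % P)


def eg_decrypt(x, ct):
    alpha, beta = ct
    return alpha * pow(pow(beta, x, P), P - 2, P) % P


def random_key():
    """A symmetric key, encoded as a random element of the subgroup."""
    return pow(G, secrets.randbelow(Q - 1) + 1, P)


def sym(kappa, data: bytes) -> bytes:
    """Toy symmetric cipher: XOR with a SHA-256 keystream (its own inverse)."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(f"{kappa}:{counter}".encode()).digest()
        counter += 1
    return bytes(b ^ k for b, k in zip(data, stream))


def envelope_encrypt(y, message: bytes):
    kappa = random_key()
    return [eg_encrypt(y, kappa)], sym(kappa, message)


def envelope_reencrypt(y, envelope):
    gamma, delta = envelope
    new_gamma = [eg_reencrypt(y, c) for c in gamma]  # step 1
    kappa = random_key()                             # step 2
    new_gamma.append(eg_encrypt(y, kappa))           # step 3
    return new_gamma, sym(kappa, delta)              # step 4


def envelope_decrypt(x, envelope):
    gamma, delta = envelope
    for c in reversed(gamma):  # peel the symmetric layers, newest first
        delta = sym(eg_decrypt(x, c), delta)
    return delta
```

Each call to `envelope_reencrypt` adds one public-key ciphertext to the key list, which makes the linear growth in the number of re-encryptions directly visible.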

There are two major drawbacks to this scheme. First, the size of a ciphertext, 
as well as the computational cost of re-encryption, grows linearly in the number 
of re-encryptions z. While this leads to poor asymptotic performance, it may 
substantially enhance the performance in practice, particularly when m is small 
and ad sizes are large. A second drawback is that it is unclear how to achieve 
robustness in an efficient way in a mix network employing such a scheme. We 
submit, however, that robustness is a less important consideration than privacy 
in our schemes. 

Reducing public-key operation costs It should be noted that all of the costly
operations in our scheme involve exponentiations in $G_q$. These may be made
quite inexpensive through the use of pre-processing or addition chains. Thus,
assuming 100,000 elements, the total cost is roughly that of...
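
One common form of pre-processing is fixed-base precomputation: when the base (for example the generator or the public key) is known in advance, its repeated squares can be tabulated once, so that each later exponentiation needs only multiplications. The text does not say which technique is intended, so the sketch below is an assumption, with toy parameters and hypothetical function names.

```python
# Toy parameters (assumption): P = 2Q + 1, G of order Q modulo P.
P, Q, G = 1019, 509, 4


def precompute(base, bits):
    """Table of base^(2^i) mod P for i = 0 .. bits-1, computed once."""
    table = [base % P]
    for _ in range(bits - 1):
        table.append(table[-1] * table[-1] % P)
    return table


def fixed_base_pow(table, exponent):
    """Compute base^exponent mod P using only multiplications of table entries.

    The exponent must be smaller than 2^len(table).
    """
    result = 1
    for i, power in enumerate(table):
        if (exponent >> i) & 1:
            result = result * power % P
    return result
```

With the table in hand, an exponentiation costs on average about half as many modular multiplications as square-and-multiply, at the price of storing one table per fixed base.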



Dishonest behavior among consumers Adversary control of consumers 

4.4 Construction of $f$

4.5 Correlated consumer data 

Advertisers are often interested in obtaining aggregate demographic information,
or in determining correlations among various data items. We propose two
possible means of accomplishing this. First, it is possible to allow consumers to 
sell their demographic information in exchange for money or services. (This is 
effectively the contract behind free ISPs such as NetZero.) This of course will 
present a somewhat biased picture. Another possible means is to invoke a mix 
network 

5 Conclusion 

This paper seeks to convey two ideas, the first cryptographic and the second 
political. On the cryptographic side, we observe that by relaxing certain of the 
security assumptions surrounding the conventional PIR model, we are able to 
achieve considerable practical improvements in terms of both communication and 
computational complexity. While the privacy guarantees offered by the schemes 
we propose are somewhat weaker than theory allows, they are often quite adequate
for real-world applications. On the political side, we offer a new
perspective on the contention between online advertisers and consumer privacy 
advocates. We demonstrate a conceptually simple technical approach to adver- 
tising that brings the objectives of both camps into closer alignment. 

Of the farrago of issues and problems we leave unaddressed, we conclude by 
mentioning some here as topics for future research. Perhaps the most pressing, 
and also the most complex, is how to harmonize the numerous approaches to 
privacy and consumer convenience in the literature and in practice. For 

Our proposed threshold PIR scheme, i.e., Scheme 4, raises an interesting 
cryptographic question. In that scheme, it is as efficient to process consumer 
requests one at a time as it is to process them in a batch. Is there some form 
of batch processing that can improve the efficiency of this scheme without a 
significant weakening of the underlying security model? 

One important issue is that of server cookies. In many cases, consumers are 
quite willing to accept cookies as a means of customizing or simplifying their 
experience of a Web site. For example, customers visiting the Amazon.com site 
may wish to use cookies in order to facilitate the provision of book recommen- 
dations and to make the purchase process easier. In this case, as users often 
provide credit card and address information, their use of cookies is essentially 
a means of self-identification, although it does not leak extensive demographic
information directly. Regulation of information gathered through distribution of 
these cookies is an important legislative question. 
1 Introduction 

In February 2000, a major Web advertising firm known as DoubleClick touched 
off a furore in the press with the announcement that it would integrate offline 
information about consumers into its existing database of online information 
derived from surveillance of consumer Web surfing. This announcement by Dou- 
bleClick came on the heels of a number of other articles in the popular press re- 
garding abuse of private consumer information. A week earlier, a report released 
by the California HealthCare Foundation alleged that a number of health-related 
Web sites were violating their own stated privacy policies and divulging sensitive 
information about customers to third parties. The next day, Reuters reported 
that two suits for privacy invasion and an investigation by the Federal Trade 
Commission were pending against Amazon.com and its subsidiary Alexa. While 
consumer and privacy advocacy groups vigorously decry such abuses, advertis- 
ers defend their policy of harvesting and exploiting demographic information 
by highlighting the benefits of targeted advertising. Consumers, they maintain, 
are more likely to find interest in advertising tailored to their own preferences, 
and such advertising consequently leads to greater consumer market efficiency. 
The United States government has addressed the issue by promoting a policy of 
industry self-regulation, leading to friction with the European Union, which has 
sought more stringent consumer privacy guarantees. 

In this paper, we show that targeted advertising and consumer privacy need 
not in fact be conflicting aims. We describe a simple, practical technical solution 
that enables sophisticated use of detailed consumer profiles for the purposes of 
targeting advertisements, but protects these profiles from disclosure to advertis- 
ers or hostile third parties. Somewhat surprisingly, the most basic embodiments 
of our idea do not even require use of cryptographic techniques. 

The underlying idea is quite simple. Rather than gathering information about
a consumer in order to decide which advertisements to send her, an advertiser
makes use of a client-side software module called a negotiant. The negotiant
serves a dual purpose: It acts as a client-side proxy to protect user information,
and also directs the targeting of advertisements. The negotiant requests
advertisements from the advertiser that are tailored to the profile provided by the
user. The advertiser can control the palette of advertisements available to the
negotiant, as well as the process by which it decides which ads to request. At
the same time, the advertiser learns no information about the consumer profile
beyond which advertisements the negotiant requested. In more sophisticated
variants, the negotiant is able to participate in a protocol whereby the advertiser
does not even learn what ads a given user has requested, but only sees
ad requests in the aggregate. The end result is that the advertiser is able to
target ads with a high degree of sophistication, and also to gather information
on ad display rates, all without learning significant information about individual
consumer profiles.

Of course, some restriction must be placed on advertiser control of negotiants.
Otherwise, the advertiser can manipulate them so as to extract detailed
profile information from individual consumers. The fact that negotiants may be
viewed and controlled by users helps offset this vulnerability, as we discuss in
the body of the paper. An additional point of concern in our proposal is a
restriction that it places on advertisers. With the use of negotiants, advertisers
cannot correlate profile information among users, as is possible when consumer
profiles are collected in a central location. This drawback should be partially
offset by the fact that the negotiant may safely and privately gain access to a
great deal of sensitive information that would otherwise not be available to
advertisers. Nonetheless, we propose some strategies for addressing this limitation
in a manner that preserves consumer privacy.

1.1 Previous Work 

A negotiant may be viewed as a client-side software proxy. The approach of using
proxies as a means of protecting consumer privacy is a well-established one, and
has seen application by both research and commercial ventures. These efforts,
however, have typically involved proxy servers, i.e., servers that act as
intermediaries between consumers and Web sites. (Of course, it is possible to install a
negotiant on a proxy server, rather than the client. This requires, however, an
embodiment of trust in the proxy server.) A number of proposals...
Consumer privacy in P3P - language of privacy

In the stronger variant of our scheme, we wish to ensure against advertisers
learning which advertisements have been requested by which users. In principle,
it is possible for a consumer to request an advertisement from a server such that
the server learns no information about the request. Cryptographic schemes to
accomplish this aim are known as private information retrieval (PIR) schemes.
More formally, let Bob be a user, and let Alice be a server that maintains a
database containing items $a = \{a_1, a_2, \ldots, a_n\}$; for instance, Alice may represent
an advertiser, and $a$ may represent the collection of advertisements held by
Alice. The aim of a PIR scheme is to enable Bob to retrieve an element $a_r \in a$
of his choice from Alice in such a way that Alice learns no information about
$r$. Of course, this may be accomplished trivially by having Alice send all of $a$
to Bob. As shown in [?], however, a PIR scheme may in fact be designed with
$o(n)$ communication, in particular, $n^\epsilon$ communication for any $\epsilon > 0$ under the
quadratic residuosity assumption. A number of improvements and variants have
subsequently been proposed in the literature. Most recent of these is a scheme
with $\mathrm{polylog}(n)$ communication overhead [?]. None of the proposed PIR schemes,
however, is practical for wide-scale deployment.

In this paper, we consider a practical alternative to these proposed PIR 
schemes. In order to obtain improved communications and computational effi- 
ciency, we relax the common security model in two respects. First, in lieu of a 
single server (Alice), we assume a collection of servers among which a majority 
behave in an honest fashion. In other words, we make use of a threshold scheme. 
Second, we assume that requests from a large number of users may be batched, 
in which case it is acceptable for servers to learn what has been requested, but 
not by whom. In other words, in consonance with the "Crowds" principle, we 
permit full disclosure of aggregate information, but hide information regarding 
the requests of individual users. We refer to a scheme satisfying the PIR criteria 
with these two weakened assumptions as a semi-private information retrieval 
(SPIR) scheme. As we show, a SPIR scheme is capable of achieving communication
overhead of $O(1)$.

2 Definitions 

Let $C_1, C_2, \ldots, C_k$ be a collection of consumers toward whom advertisements are
to be directed. Let $P_1, P_2, \ldots, P_k$ be the respective profiles of these consumers.
These profiles may contain any of a variety of pieces of information on the
consumer, including standard demographic information such as age, sex, profession,
annual income, etc., as well as other information such as recently visited URLs
and search engine queries. Let us designate the set of possible consumer profiles
by $\mathcal{P}$. We denote the advertiser by $A$, and let $a = \{a_1, a_2, \ldots, a_n\}$ be the set
of advertisements that $A$ seeks to disseminate. The advertiser chooses a negotiant
function $f_a : \mathcal{P} \rightarrow \mathbb{Z}_n$. This function takes the profile of a consumer as
input and outputs a choice of advertisement from among $a$. As an example, $f$
might derive a list of the most common words in URLs visited by the user and
seek to match these to descriptors associated with the ads in $a$. Of course, it is
possible to include additional inputs to $f$, such as the current date, or the list
of advertisements already sent to the consumer. We leave these choices to the
imagination of the reader.

In a basic negotiant protocol, a consumer $C_i$ downloads $f$ from the advertiser
$A$. He obtains $f(P_i)$ from $A$ through an interactive protocol. In a group negotiant
protocol, we assume that $A$ is represented by a collection of servers $S_1, S_2, \ldots, S_m$
who share a bulletin board. This is a piece of shared memory with authenticated
appendative write access, and is accessible by any server or consumer. Consumers
post ad requests to the bulletin board in some form until a triggering event
occurs. Servers then initiate communication with consumers and dispense ads
to them. The following is a list of definitions and properties useful in describing
negotiant protocols.

Let $l$ be a security parameter for a negotiant protocol. We say that a function
$h(l)$ is negligible if for any polynomial $p$, there exists a value $d$ such that for $l > d$,
we have $h(l) < 1/|p(l)|$. Otherwise, we say that $h$ is non-negligible.

We say that a negotiant protocol has profile privacy if the following holds
for every consumer $C_i$. Let $A_1$ be a polynomial-time algorithm that takes as
input the transcript of the negotiant protocol and has access to all information
possessed by $A$. Let $A_2$ be a polynomial-time algorithm that has access to all
information possessed by $A$ and also to $f(P_i)$. Let b be a

1.1 Previous Work 

A negotiant may be viewed as a client-side software proxy. The approach of using
proxies as a means of protecting consumer privacy is a well-established one, and
has seen application by both research and commercial ventures. These efforts,
however, have typically involved proxy servers, i.e., servers that act as
intermediaries between consumers and Web sites. (Of course, it is possible to install a
negotiant on a proxy server, rather than the client. This requires, however, an
embodiment of trust in the proxy server.) A number of proposals...
Consumer privacy in P3P - language of privacy

In the stronger variant of our scheme, we wish to ensure against advertisers
learning which advertisements have been requested by which users. In principle,
it is possible for a consumer to request an advertisement from a server such that
the server learns no information about the request. Cryptographic schemes to
accomplish this aim are known as private information retrieval (PIR) schemes.
More formally, let Bob be a user, and let Alice be a server that maintains a
database containing items $a = \{a_1, a_2, \ldots, a_n\}$; for instance, Alice may represent
an advertiser, and $a$ may represent the collection of advertisements held by
Alice. The aim of a PIR scheme is to enable Bob to retrieve an element $a_r \in a$
of his choice from Alice in such a way that Alice learns no information about
$r$. Of course, this may be accomplished trivially by having Alice send all of $a$
to Bob. As shown in [?], however, a PIR scheme may in fact be designed with
$o(n)$ communication, in particular, $n^\epsilon$ communication for any $\epsilon > 0$ under the
quadratic residuosity assumption. A number of improvements and variants have
subsequently been proposed in the literature. Most recent of these is a scheme
with $\mathrm{polylog}(n)$ communication overhead [?]. None of the proposed PIR schemes,
however, is practical for wide-scale deployment.

In this paper, we consider a practical alternative to these proposed PIR 
schemes. In order to obtain improved communications and computational effi- 
ciency, we consider two relaxations of the common security model. First, in lieu of 
a single server (Alice), we assume a collection of servers among which a majority 
behave in an honest fashion. We refer to this as a threshold PIR scheme. As we 
show, a threshold PIR scheme is capable of achieving communication overhead 
of $O(1)$ per consumer request under appropriate computational assumptions. 
As a second relaxation, we consider a scenario in which requests from a large 
number of users may be batched, in which case it is acceptable for servers to 
learn what has been requested, but not by whom. In other words, in consonance 
with the "Crowds" principle, we permit full disclosure of aggregate information, 
but hide information regarding the requests of individual users. We refer to a 
threshold PIR scheme with this latter property as a semi-private information 
retrieval (SPIR) scheme. A SPIR scheme, in addition to achieving communication
overhead of $O(1)$, is computationally quite efficient, involving $O(1)$ basic
cryptographic operations per item per server.

2 Preliminaries 
2.1 Building blocks 

We begin by introducing some of the cryptographic primitives used in the more 
sophisticated variants of our protocol. Readers with extensive familiarity with the 
basic cryptographic literature may wish to skip to Section 2.2. 

El Gamal cryptosystem: A convenient basis for many election schemes, including
those we employ here, is the El Gamal cryptosystem [?,?]. Encryption takes place
over a group $G_q$ of prime order $q$. Typically, $G_q$ is taken to be a subgroup of $\mathbb{Z}_p^*$,
where $q \mid p - 1$, but alternatives are possible; for example, $G_q$ may be the group
of points of an elliptic curve over a finite field.$^1$

Let $g$ be a generator of $G_q$; this generator is typically regarded as a system
parameter, since it may correspond to multiple key pairs. The private encryption
key consists of an integer $x \in_U \mathbb{Z}_q$, where $\in_U$ denotes uniform random selection.
The corresponding public key is defined to be $y = g^x$. To encrypt a message
$m \in G_q$, we select $a \in_U \mathbb{Z}_q$, and compute the ciphertext $(\alpha, \beta) = (my^a, g^a)$. To decrypt
this ciphertext using the private key $x$, we compute $\alpha/\beta^x = my^a/(g^a)^x = m$.

$^1$ Most commonly, we let $p = 2q + 1$, and we let $G_q$ be the set of quadratic residues
in $\mathbb{Z}_p^*$. In this setting, plaintexts not in $G_q$ can be mapped onto $G_q$ by appropriate
forcing of the Legendre symbol, e.g., inversion of the associated integer sign.

We assume a consistent choice of $g$ as a generator for all instantiations of the El
Gamal cryptosystem in this paper.
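
To make the encryption and decryption equations concrete, here is a minimal sketch of El Gamal over the quadratic residues modulo a safe prime. The parameters (`P = 1019`, `Q = 509`, `G = 4`) are toy values chosen for readability and are an assumption of this example; a real deployment would use a far larger group.

```python
import secrets

# Toy parameters (assumption): P = 2Q + 1 with P and Q prime; G = 4 is a
# quadratic residue modulo P and therefore generates the subgroup of order Q.
P, Q, G = 1019, 509, 4


def keygen():
    """Private key x in Z_q, public key y = g^x."""
    x = secrets.randbelow(Q - 1) + 1
    return x, pow(G, x, P)


def encrypt(y, m):
    """(alpha, beta) = (m * y^a, g^a) for a fresh random a in Z_q."""
    a = secrets.randbelow(Q)
    return (m * pow(y, a, P) % P, pow(G, a, P))


def decrypt(x, ciphertext):
    """m = alpha / beta^x, with the inverse taken via Fermat's little theorem."""
    alpha, beta = ciphertext
    return alpha * pow(pow(beta, x, P), P - 2, P) % P
```

Messages must be elements of the subgroup; in this toy setting any value of the form `pow(G, k, P)` qualifies.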

The El Gamal cryptosystem is semantically secure under the Decision Diffie-
Hellman assumption over $G_q$ [?,?]. Informally, this means that an attacker who
selects message pair $(m_0, m_1)$ is unable to distinguish between encryptions of
these two messages with probability significantly greater than 1/2. (See [?] for
further details.)

Let $(\alpha_0\alpha_1, \beta_0\beta_1) = (\alpha_0, \beta_0) \otimes (\alpha_1, \beta_1)$. Another useful property of the El
Gamal cryptosystem is the fact that it possesses a homomorphism under the
operator $\otimes$. In particular, observe that if $(\alpha_0, \beta_0)$ and $(\alpha_1, \beta_1)$ represent
ciphertexts corresponding to plaintexts $m_0$ and $m_1$ respectively, then $(\alpha_0, \beta_0) \otimes (\alpha_1, \beta_1)$
represents an encryption of the plaintext $m_0 m_1$. A consequence of this
homomorphic property is that it is possible, using knowledge of the public key alone,
to derive a random re-encryption $(\alpha', \beta')$ of a given ciphertext $(\alpha, \beta)$. This is
accomplished by computing $(\alpha', \beta') = (\alpha, \beta) \otimes (\gamma, \delta)$, where $(\gamma, \delta)$ represents an
encryption of the plaintext value 1. It is possible to prove quite efficiently in
zero-knowledge that $(\alpha', \beta')$ represents a valid re-encryption of $(\alpha, \beta)$ using a
variant of the Schnorr proof of knowledge protocol [?]. This proof may also be
made non-interactive. See [?] for an overview.
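
The homomorphism and the re-encryption it enables can be checked numerically. The sketch below uses the same kind of toy group as any small El Gamal example (the parameters and function names are assumptions of this illustration) and omits the zero-knowledge proof of correct re-encryption.

```python
import secrets

# Toy parameters (assumption): P = 2Q + 1, G of prime order Q modulo P.
P, Q, G = 1019, 509, 4


def encrypt(y, m):
    a = secrets.randbelow(Q)
    return (m * pow(y, a, P) % P, pow(G, a, P))


def decrypt(x, ciphertext):
    alpha, beta = ciphertext
    return alpha * pow(pow(beta, x, P), P - 2, P) % P


def combine(c0, c1):
    """The operator on ciphertexts: component-wise multiplication."""
    return (c0[0] * c1[0] % P, c0[1] * c1[1] % P)


def reencrypt(y, ciphertext):
    """Re-randomize using only the public key: multiply by an encryption of 1."""
    return combine(ciphertext, encrypt(y, 1))
```

Note that `reencrypt` never touches the private key, which is what allows mix servers to re-randomize ciphertexts they cannot read.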

Bulletin Board: Our proposed schemes with multiple players or servers assume 
the availability of a bulletin board. This may be viewed as a piece of memory 
which any player may view and to which all players have appendative write 
access. A bulletin board may be realized as a public broadcast channel, achievable 
through Byzantine agreement or through some appropriate physical assumption. 
See [?,?] for description of a practical implementation of this primitive. Postings
to a bulletin board may be made authenticable, i.e., their source may be securely
validated, through use of such mechanisms as digital signatures.

Mix networks: An important building block in one of our protocols is a mix
network. This is a primitive executed by a collection of servers $S_1, S_2, \ldots, S_m$
with an appropriately shared public/private key pair $(y, x)$. Let $E_y[M]$ represent
the encryption under public key $y$ of message $M$ in a probabilistic public-key
cryptosystem $E$. A mix network takes as input a vector of ciphertexts
$V = \{E_y[M_1], E_y[M_2], \ldots, E_y[M_n]\}$. Output from the mix network is the vector
$V' = \{E_y[M_{\sigma(1)}], E_y[M_{\sigma(2)}], \ldots, E_y[M_{\sigma(n)}]\}$, where $\sigma$ is a random permutation on $n$
elements. A mix scheme is said to be robust if, given a static adversary with active
control of a minority coalition of servers, $V'$ represents a valid permutation and
re-encryption of ciphertexts in $V$ with overwhelming probability. A mix scheme
is said to be private if, given valid output $V'$, for any $i \in \{1, 2, \ldots, n\}$, an
adversary with passive control of at most $m - 1$ servers cannot determine $\sigma^{-1}(i)$
with probability non-negligibly larger than $1/n$.

Mix servers were introduced by Chaum as a basic primitive for privacy.
In his simple formulation, each server $S_i$ takes the output $V_{i-1}$ of the previous
server and simply permutes and re-encrypts the ciphertexts therein. While this
scheme is private, it is not robust. A robust mix network was introduced by.
The most efficient to date is the construction of Jakobsson, which requires $O(n)$
computation and communication per server. The scheme makes use of the El
Gamal cryptosystem. For further details and formal definitions, the reader is
referred to [?]. Given its robustness and efficiency on large batches, the Jakobsson
construction is probably most appropriate for our schemes. Given, however, that
robustness is not of critical importance in these schemes, even the mix network
proposed by Chaum may often be appropriate.

There are many variations on mix networks. For example, $V$ may be a vector
of tuples of ciphertexts [?]. Additionally, a mix network may take either
ciphertexts or plaintexts as inputs and likewise output either plaintexts or
ciphertexts. We employ a variety of such operations in our protocols, and do not
describe implementation details.
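
A minimal sketch of the simple (private but not robust) formulation, in which each server permutes and re-encrypts the batch it receives, might look as follows. The toy group parameters, the function names, and the use of a single unshared key pair for checking the result are assumptions of this illustration; proofs of correct mixing are omitted entirely.

```python
import secrets

# Toy parameters (assumption): P = 2Q + 1, G of prime order Q modulo P.
P, Q, G = 1019, 509, 4


def encrypt(y, m):
    a = secrets.randbelow(Q)
    return (m * pow(y, a, P) % P, pow(G, a, P))


def decrypt(x, ciphertext):
    alpha, beta = ciphertext
    return alpha * pow(pow(beta, x, P), P - 2, P) % P


def reencrypt(y, ciphertext):
    """Multiply by a fresh encryption of 1 under the same public key."""
    a = secrets.randbelow(Q)
    alpha, beta = ciphertext
    return (alpha * pow(y, a, P) % P, beta * pow(G, a, P) % P)


def mix_server(y, batch):
    """One server: re-encrypt every ciphertext, then apply a secret permutation."""
    out = [reencrypt(y, c) for c in batch]
    secrets.SystemRandom().shuffle(out)
    return out


def mix_network(y, batch, m):
    """Pass the batch through m servers in sequence."""
    for _ in range(m):
        batch = mix_server(y, batch)
    return batch
```

The output decrypts to the same multiset of plaintexts as the input, while the correspondence between input and output positions is hidden as long as at least one server keeps its permutation secret.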

Robust threshold El Gamal cryptosystem: A robust (j, m)-threshold cryptosys- 
tem is one in which a private key is held distributively by m players in such a 
way that a ciphertext may only be decrypted by j of them; decryption takes 
place without leakage of information about the private key. (See, e.g., [?] for a 
survey.) The Pedersen protocol [?,?] may be used as a basis for key generation 
in such a scheme for El Gamal, although see [?] for a caveat. For a description 
of a corresponding decryption algorithm, see, e.g., [?]. These algorithms are, in 
a practical sense, quite efficient. 

Distributed ciphertext key transformation (DIKT) This is essentially a variant
on robust threshold El Gamal decryption. The protocol is executed by servers
$S_1, S_2, \ldots, S_m$ holding El Gamal secret key $x$ in an appropriate distributed
fashion. For simplicity of presentation, we assume here that $x$ is shared additively
among the servers. In particular, we assume that server $S_i$ holds public/private
share $(x_i, y_i = g^{x_i})$ such that $x = \sum_{i=1}^{m} x_i$. (Robustness is achievable by
having a second level sharing, i.e., a sharing of shares.) Our protocol can be easily
extended to more natural sharing schemes, such as threshold schemes based on
Shamir secret sharing. We omit discussion of relevant threshold key generation
and management issues, and instead refer the reader to, e.g., [?]. Let $y$ be the
public key corresponding to $x$.

The DIKT protocol takes as input an El Gamal public key $y'$, as well as a
ciphertext $(\alpha, \beta) = E_y[M]$. The output of the protocol is $(\alpha', \beta') = E_{y'}[M]$. The
protocol is said to be robust if, given an adversary with active control of a static
minority coalition of servers, the output of the protocol is correct. It is said to
be private if such an adversary learns no information about $M$. In particular, we
require, given the presence of such an adversary, that $M$ be semantically secure
under this protocol, in a sense analogous to that for the underlying El Gamal
cryptosystem. As in the other primitives described in this paper, we assume the
availability of a bulletin board.

We propose a protocol in which each server $S_i$ does the following.

1. $S_i$ chooses $w \in_U \mathbb{Z}_q$.

2. $S_i$ computes $(\gamma_i, \delta_i) = (\beta^{x_i} y'^{w}, g^{w})$ and posts it to the bulletin board.

3. $S_i$ posts a proof, relative to her public key $y_i$, that $(\gamma_i, \delta_i)$ is correctly
formulated.

This last step may be accomplished by having $S_i$ prove knowledge of exponents
$d_1$ and $d_2$ such that $\gamma_i = \beta^{d_1} y'^{d_2}$ and $\delta_i = g^{d_2}$ and $y_i = g^{d_1}$. This may be
done efficiently and in either interactive or non-interactive zero knowledge using
protocols described in, e.g., [?,?].

If any server proof is incorrect, that server is expelled from the coalition,
and the protocol is performed again with the remaining servers. Once all servers
have correctly posted their partial computations, the servers jointly compute
$(\alpha', \beta') = (\alpha / \prod_{i=1}^{m} \gamma_i, \beta / \prod_{i=1}^{m} \delta_i)$. The DIKT scheme may be shown to be
robust and private relative to the Decision Diffie-Hellman assumption.
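
The following sketch illustrates the idea of transforming a ciphertext from an additively shared key $x$ to a new public key $y'$ without ever reconstructing $x$ or exposing the plaintext. Because the exact combining formulas in the source are difficult to read, the sketch makes its own assumption: each server contributes $(\gamma_i, \delta_i) = (\beta^{x_i} y'^{-w_i}, g^{w_i})$ and the output is $(\alpha / \prod_i \gamma_i, \prod_i \delta_i)$, a choice made so that the result decrypts correctly under the new key. It should not be read as the authors' exact protocol. The toy parameters and function names are hypothetical, and the correctness proofs and the expulsion of misbehaving servers are omitted.

```python
import secrets

# Toy parameters (assumption): P = 2Q + 1, G of prime order Q modulo P.
P, Q, G = 1019, 509, 4


def inv(a):
    """Modular inverse modulo the prime P."""
    return pow(a, P - 2, P)


def share_key(m):
    """Additive shares x_1..x_m of x, plus the joint public key y = g^x."""
    shares = [secrets.randbelow(Q) for _ in range(m)]
    return shares, pow(G, sum(shares) % Q, P)


def encrypt(y, msg):
    a = secrets.randbelow(Q)
    return (msg * pow(y, a, P) % P, pow(G, a, P))


def decrypt(x, ciphertext):
    alpha, beta = ciphertext
    return alpha * inv(pow(beta, x, P)) % P


def server_contribution(x_i, beta, y_new):
    """Partial decryption under x_i, blinded by fresh randomness under y_new."""
    w = secrets.randbelow(Q)
    gamma_i = pow(beta, x_i, P) * inv(pow(y_new, w, P)) % P
    delta_i = pow(G, w, P)
    return gamma_i, delta_i


def transform(shares, ciphertext, y_new):
    """Combine all contributions into a ciphertext under y_new."""
    alpha, _beta = ciphertext
    gamma_prod, delta_prod = 1, 1
    for x_i in shares:
        gamma_i, delta_i = server_contribution(x_i, ciphertext[1], y_new)
        gamma_prod = gamma_prod * gamma_i % P
        delta_prod = delta_prod * delta_i % P
    return (alpha * inv(gamma_prod) % P, delta_prod)
```

Under these assumptions the output is $(M y'^{W}, g^{W})$ with $W = \sum_i w_i$, a standard El Gamal encryption of $M$ under $y'$ in which no single server knows the full randomness.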

Distributed plaintext equality test This is a protocol whereby, given El Gamal
ciphertexts $(\alpha, \beta)$ and $(\alpha', \beta')$, a collection of servers determine whether the
underlying plaintexts are identical. The protocol may be performed in a robust
manner, and in such a way that no additional information is revealed. See [?]
for an efficient protocol construction and proofs of security. We write
$(\alpha, \beta) \approx (\alpha', \beta')$ to denote equality of underlying plaintexts.

2.2 Model and definitions for our scheme 

Let $C_1, C_2, \ldots, C_k$ be a collection of consumers toward whom
advertisements are to be directed. Let $P_1, P_2, \ldots, P_k$ be the
respective profiles of these consumers. These profiles may contain any of a
variety of pieces of information on the consumer, including standard
demographic information such as age, sex, profession, annual income, etc., as
well as other information such as recently visited URLs and search engine
queries. Let us designate the set of possible consumer profiles by $V$. We
denote the advertiser by $A$, and let $a = \{a_1, a_2, \ldots, a_n\}$ be the set
of advertisements that $A$ seeks to disseminate. The advertiser chooses a
negotiant function $f_a : V \to \mathbb{Z}_n$. This function takes the profile
of a consumer as input and outputs a choice of advertisement from among $a$. As
an example, $f$ might derive a list of the most common words in URLs visited by
the user and seek to match these to descriptors associated with the ads in $a$.
Of course, it is possible to include additional inputs to $f$, such as the
current date, or the list of advertisements already sent to the consumer. We
leave these choices to the imagination of the reader. We assume that $A$ is
represented by a collection of servers $S_1, S_2, \ldots, S_m$ who share a
bulletin board. All consumers post ad requests to the bulletin board. Servers
then initiate communication with consumers and dispense ads to them. The
following is a list of definitions and properties useful in describing
negotiant protocols.
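
The URL-matching example can be sketched as follows. This is only one possible negotiant function; the profile layout (a dictionary with a `urls` list), the stop-word list, the `top` parameter, and the scoring rule are hypothetical choices made for illustration.

```python
import re
from collections import Counter

# Hypothetical stop words so that URL boilerplate does not dominate the count.
STOP_WORDS = {"http", "https", "html", "index"}

def negotiant(profile, ad_descriptors, top=3):
    """Return an index in Z_n: the ad whose descriptors best match the
    most common words in the URLs recorded in the profile."""
    words = Counter()
    for url in profile["urls"]:
        tokens = re.split(r"[^a-z]+", url.lower())
        words.update(w for w in tokens if len(w) > 3 and w not in STOP_WORDS)
    common = {w for w, _ in words.most_common(top)}
    # Score each ad by the number of descriptors shared with the common words.
    scores = [len(common & set(d)) for d in ad_descriptors]
    # Ties are broken in favor of the lowest index.
    return max(range(len(scores)), key=scores.__getitem__)
```

The function runs entirely on the consumer's side, so the profile itself never needs to leave the consumer's machine; only the resulting index is used in the protocols.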

Let $l$ be an appropriately defined security parameter. We say that a function
$q(l)$ is negligible if for any polynomial $p$, there exists a value $d$ such
that for $l > d$, we have $q(l) < 1/|p(l)|$. Otherwise, we say that $q$ is
non-negligible. We say that a probability function $q(l)$ is overwhelming if
$1 - q(l)$ is negligible.

Let $A_1$ be a polynomial-time adversary that actively controls a static
minority set of servers, or, if there is only one server, controls that single
server. In other words, let us suppose that $A_1$ controls
$\max(\lfloor m/2 \rfloor, 1)$ servers. Consider the following experiment. Let
us assume without loss of generality that $A_1$ does not control consumer
$C_1$. $A_1$ chooses a pair of candidate profiles
$\tilde{P}_0, \tilde{P}_1 \in V$. A bit $b \in \{0, 1\}$ is selected at random
and $P_1$, the profile of $C_1$, is set to $\tilde{P}_b$. Now the protocol is
run. We say that the protocol has full privacy if for any adversary $A_1$, it
is the case that $\Pr[A_1 \text{ outputs } b] - 1/2$ is negligible, where the
probability is taken over the coin flips of all participants. This definition
states informally that the protocol transcript reveals no significant
information about $P_1$.

Now let us modify the experiment slightly and consider a polynomial-time
algorithm $A_2$ identical to $A_1$, except that $A_2$ is provided only with the
input $f(P_1), f(P_2), \ldots, f(P_k)$ in a random order. We say that a
negotiant protocol has profile privacy if for any $A_1$ and any
$P_2, P_3, \ldots, P_k$, there exists an $A_2$ such that
$\Pr[A_1 \text{ outputs } b] - \Pr[A_2 \text{ outputs } b]$ is negligible if
all consumers adhere to the correct protocol. Again, probabilities are taken
over the coin flips of all participants. In other words, the protocol
transcript reveals no significant information about $P_1$ other than that
revealed by the aggregate ad requests of the participating consumers. Note that
when $m = 1$, the property of profile privacy means that an advertiser learns
only the ad requests of a consumer. When $m > 1$, the property implies that an
advertiser learns only the aggregate ad requests of a group of consumers.

We say that a negotiant protocol is aggregate transparent if any server can
determine $f(P_1), f(P_2), \ldots, f(P_k)$ with overwhelming probability. In
real-world advertising scenarios, it is important that a protocol be aggregate
transparent, as the clients of advertisers wish to know how many times their
ads have been displayed.
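
To make the notion of an aggregate view concrete, the sketch below models the ad requests in random order, and the per-ad display counts that a server can derive from them. The function names are hypothetical, and `random.shuffle` stands in for whatever unlinking mechanism a real protocol would use.

```python
import random
from collections import Counter

def aggregate_view(requests):
    """Return the ad requests f(P_1), ..., f(P_k) in random order, so that
    no request can be linked to the consumer who made it by position."""
    view = list(requests)
    random.shuffle(view)
    return view

def display_counts(view, n):
    """Number of times each of the n ads was requested."""
    counts = Counter(view)
    return [counts[j] for j in range(n)]
```

For example, with requests `[2, 0, 2, 1, 2]` and `n = 3`, the counts are `[1, 1, 3]` regardless of how the view was shuffled.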

3 Some Negotiant Schemes

We now present a small spectrum of negotiant schemes with different properties
and resource costs. 

3.1 Naive PIR scheme 

We present this simple scheme as a conceptual introduction. Here, requests are
directed from a single consumer $C$ with profile $P$ to a single server $S$.
(Thus the scheme may be modeled by $m = k = 1$.) The scheme is this: The server
sends all of $a$ to $C$, who then views ad $a_{f(P)}$.

Clearly, this scheme enjoys full privacy. The chief drawback is the $O(n)$
communication cost. Another drawback is the fact that the scheme is not
aggregate transparent. Nonetheless, given a limited base of advertisements and
good bandwidth, and if advertisers are satisfied with recording click-through
rates, this scheme may be usable in certain practical scenarios.

3.2 Direct request scheme

This is another conceptually simple scheme involving a one-on-one consumer and
server interaction. In this scheme, $C$ simply sends $f(P)$ to $S$, who returns
$a_{f(P)}$. This scheme enjoys profile privacy and has communication and
computation costs $O(1)$. Despite (or because of) its simplicity, it is from a
practical standpoint perhaps the most appealing of the schemes proposed here.
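
The contrast between the naive PIR scheme of Section 3.1 and the direct request scheme can be sketched as follows. Each function returns the ad viewed, what the server learns, and the number of ads transmitted; the function names and return layout are hypothetical.

```python
def naive_pir(ads, f, profile):
    """Server sends every ad; the consumer selects one locally."""
    received = list(ads)      # O(n) communication
    server_view = None        # the server learns nothing about the choice
    return received[f(profile)], server_view, len(received)

def direct_request(ads, f, profile):
    """Consumer sends f(P); the server returns the single requested ad."""
    r = f(profile)            # computed on the consumer's side
    server_view = r           # the server learns the request, not the profile
    return ads[r], server_view, 1
```

Both functions deliver the same ad. They differ only in the server's view and in the amount of data sent, which is the trade-off between full privacy and profile privacy described above.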

3.3 SPIR scheme 

We now show how to invoke some of the cryptographic apparatus described above
in order to achieve a semi-private information retrieval (SPIR) scheme. Given
database $a = \{a_1, a_2, \ldots, a_n\}$, the goal is for a collection of
consumers $C_1, C_2, \ldots, C_k$ to retrieve respective elements
$a_{r_1}, a_{r_2}, \ldots, a_{r_k}$ in such a way that the database servers
learn requests only in the aggregate. Of course, our aim here is to apply this
scheme to the retrieval of advertisements, and we shall present it in this
context. In other words, we assume that $r_i = f(P_i)$, i.e., users are
consumers seeking to retrieve advertisements. As above, we assume a
public/private El Gamal key pair $(y, x)$ held in an appropriate distributed
manner by servers $S_1, S_2, \ldots, S_m$. We also assume that each consumer
$C_i$ has a public/private El Gamal key pair $(y_{C_i}, x_{C_i})$. The scheme
is as follows.

1. Each consumer $C_i$ computes $r_i = f(P_i)$ and posts the pair
$(E_y[r_i], i)$ to the bulletin board. Let $V_1 = \{(E_y[r_i], i)\}_{i=1}^{k}$
be a vector of ciphertext/plaintext pairs accumulated when all consumers have
posted their requests.

2. Servers apply a mix network to $V_1$ to obtain $V_2$, where $V_2$ is a
vector of pairs $\{(r_{\sigma_1(i)}, E_y[\sigma_1(i)])\}_{i=1}^{k}$ for random,
unknown permutation $\sigma_1$.

3. Servers replace each integer $r_j$ in $V_2$ with $a_{r_j}$. Call the
resulting vector $V_2'$.

4. Servers apply a mix network to $V_2'$ to obtain a vector $V_3$, where $V_3$
is a vector of pairs $\{(E_y[a_{r_{\sigma_2(i)}}], i)\}_{i=1}^{k}$, and
$\sigma_2$ is an unknown random permutation.

5. Let $(E_y[a_j], i)$ be an element in $V_3$. For each pair, the servers apply
DIKT to obtain $(E_{y_{C_i}}[a_j], i)$. Let the resulting vector be $V_4$.

6. For each element $(E_{y_{C_i}}[a_j], i)$ in $V_4$, servers send
$E_{y_{C_i}}[a_j]$ to $C_i$.

7. Consumers decrypt their respective ciphertexts.
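
The data flow of the seven steps can be traced with a single-process simulation. The sketch below is not a secure or distributed implementation: one party holds $x$, each mix network is replaced by a local shuffle, DIKT is replaced by decrypt-then-encrypt, and integers and ads are encoded as elements of a toy group. All of these are simplifying assumptions.

```python
import random
import secrets

P, Q, G = 467, 233, 4   # toy group (assumed): P = 2Q + 1, G has order Q

def keygen():
    x = secrets.randbelow(Q - 1) + 1
    return x, pow(G, x, P)

def enc(y, m):
    k = secrets.randbelow(Q - 1) + 1
    return (m * pow(y, k, P)) % P, pow(G, k, P)

def dec(x, ct):
    alpha, beta = ct
    return (alpha * pow(pow(beta, x, P), P - 2, P)) % P

# Small integers are encoded as G^v so that they lie in the subgroup.
DECODE = {pow(G, v, P): v for v in range(Q)}

def spir(ads, requests, consumer_keys):
    """ads: list of group elements; requests: r_i = f(P_i);
    consumer_keys: list of (x_Ci, y_Ci). Returns (delivered ads, server view)."""
    x, y = keygen()                      # stands in for the shared server key
    # Step 1: consumers post (E_y[r_i], i).
    v1 = [(enc(y, pow(G, r, P)), i) for i, r in enumerate(requests)]
    # Step 2: mix; requests become plaintext, consumer indices become ciphertext.
    random.shuffle(v1)
    v2 = [(DECODE[dec(x, c)], enc(y, pow(G, i, P))) for c, i in v1]
    server_view = [r for r, _ in v2]     # requests, unlinked from consumers
    # Step 3: replace each request r_j with the ad a_{r_j}.
    v2_prime = [(ads[r], c) for r, c in v2]
    # Step 4: mix again; ads become ciphertext, indices become plaintext.
    random.shuffle(v2_prime)
    v3 = [(enc(y, a), DECODE[dec(x, c)]) for a, c in v2_prime]
    # Step 5: key translation to each consumer's key (simulated, not real DIKT).
    v4 = [(enc(consumer_keys[i][1], dec(x, c)), i) for c, i in v3]
    # Steps 6 and 7: deliver and decrypt.
    delivered = {i: dec(consumer_keys[i][0], c) for c, i in v4}
    return delivered, server_view
```

The returned `server_view` contains exactly the multiset of requests, which is the aggregate information that profile privacy permits the servers to learn.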

It may be shown that this is a SPIR scheme, i.e., enjoys profile privacy,
relative to the Decision Diffie-Hellman assumption. Assuming that a public key
operation in $G_q$ incurs cost $O(l^3)$, where it will be recalled that $l$ is
a security parameter, the computational costs of the scheme are $O(l^3)$ per
element per server. The communication costs of the scheme are $O(1)$. With some
efficiency enhancements, we believe that this scheme may be implemented in a
fairly practical manner.

Bulk encryption. We assume here that an advertisement may be represented as a
single ciphertext. Of course, in reality, it is impractical to use ads small
enough or a group $G_q$ large enough to support this assumption. We may,
however, represent an advertisement as a sequence of associated ciphertexts.
Alternatively, we may use an enveloping scheme, and represent an encryption of
$M$ as $E_y[M] = (\gamma, \delta)$, where
$\gamma = \{E_y[\kappa_1], E_y[\kappa_2], \ldots, E_y[\kappa_z]\}$ and
$\delta = \epsilon_{\kappa_z} \epsilon_{\kappa_{z-1}} \cdots \epsilon_{\kappa_1}[M]$.
Here, $\epsilon_\kappa[M]$ represents a symmetric-key encryption of $M$, where
$\kappa \in K$ is a key from keyspace $K$. To re-encrypt $E_y[M]$ as
$(\gamma', \delta')$, a player does the following:

1. Re-encrypt all ciphertexts in $\gamma$.

2. Select $\kappa_{z+1} \in_U K$.

3. Append $E_y[\kappa_{z+1}]$ to $\gamma$ to obtain $\gamma'$.

4. Compute $\delta'$ as $\epsilon_{\kappa_{z+1}}[\delta]$.
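
The four re-encryption steps can be sketched as follows. The toy group, the choice of keyspace (subgroup elements), and the XOR stream cipher derived from SHAKE-256 are illustrative assumptions; the text does not prescribe a particular symmetric cipher.

```python
import hashlib
import secrets

P, Q, G = 467, 233, 4   # toy group (assumed): P = 2Q + 1, G has order Q

def keygen():
    x = secrets.randbelow(Q - 1) + 1
    return x, pow(G, x, P)

def enc(y, m):
    k = secrets.randbelow(Q - 1) + 1
    return (m * pow(y, k, P)) % P, pow(G, k, P)

def reenc(y, ct):
    """El Gamal re-encryption: multiply in a fresh encryption of 1."""
    k = secrets.randbelow(Q - 1) + 1
    return (ct[0] * pow(y, k, P)) % P, (ct[1] * pow(G, k, P)) % P

def dec(x, ct):
    return (ct[0] * pow(pow(ct[1], x, P), P - 2, P)) % P

def sym(kappa, data):
    """Toy symmetric cipher: XOR with a keystream derived from kappa.
    Encryption and decryption are the same operation."""
    stream = hashlib.shake_256(str(kappa).encode()).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))

def random_key():
    # Keys are subgroup elements so they can be El Gamal plaintexts.
    return pow(G, secrets.randbelow(Q - 1) + 1, P)

def envelope_encrypt(y, message):
    kappa = random_key()
    return [enc(y, kappa)], sym(kappa, message)

def envelope_reencrypt(y, gamma, delta):
    gamma_new = [reenc(y, c) for c in gamma]   # step 1
    kappa = random_key()                       # step 2
    gamma_new.append(enc(y, kappa))            # step 3
    return gamma_new, sym(kappa, delta)        # step 4

def envelope_decrypt(x, gamma, delta):
    # Peel the symmetric layers from the outermost key inward.
    for c in reversed(gamma):
        delta = sym(dec(x, c), delta)
    return delta
```

Each call to `envelope_reencrypt` adds one ciphertext to `gamma`, which makes the linear growth in the number of re-encryptions visible directly in the length of the list.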

There are two major drawbacks to this scheme. First, the size of a ciphertext, 
as well as the computational cost of re-encryption, grows linearly in the number 
of re-encryptions z. While this leads to poor asymptotic performance, it may 
substantially enhance the performance in practice, particularly when m is small 
and ad sizes are large. A second drawback is that it is unclear how to achieve 
robustness in an efficient way in a mix network employing such a scheme. We 
submit, however, that robustness is a less important consideration than privacy 
in our schemes. 

Reducing public-key operation costs. It should be noted that all of the costly
operations in our scheme involve exponentiations in $G_q$. These may be made
quite inexpensive through the use of pre-processing or addition chains. Thus,
assuming 100,000 elements, the total cost is roughly that of...
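
One simple form of pre-processing is a fixed-base table of repeated squarings, sketched below. The text does not specify which technique is intended, so this choice, along with the function names, is an assumption. It applies when the base (such as $g$ or $y$) is known in advance.

```python
def precompute(g, p, bits):
    """Table of g^(2^i) mod p for 0 <= i < bits, computed once per base."""
    table = [g % p]
    for _ in range(bits - 1):
        table.append(table[-1] * table[-1] % p)
    return table

def fixed_base_pow(table, e, p):
    """Compute g^e mod p using only multiplications of table entries.
    Requires 0 <= e < 2^len(table)."""
    result = 1
    i = 0
    while e:
        if e & 1:
            result = result * table[i] % p
        e >>= 1
        i += 1
    return result
```

With the table in place, each exponentiation costs at most one multiplication per exponent bit and no squarings, at the price of storing one group element per bit.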

3.4 Threshold PIR 

The SPIR scheme described above can be strengthened with a few extra steps, 
and at the expense of additional computational overhead, into a threshold PIR 
scheme. The idea is to perform a blind lookup of consumer ad requests. This is 
accomplished by mixing ads and then invoking the distributed plaintext equality 
test described in Section 2.1. We present the protocol here. 

1. Each consumer $C_i$ computes $r_i = f(P_i)$ and posts the pair
$(E_y[r_i], i)$ to the bulletin board. Let $V_1 = \{(E_y[r_i], i)\}_{i=1}^{k}$
be a vector of ciphertext/plaintext pairs accumulated when all consumers have
posted their requests.

2. For each pair $(E_y[r_i], i)$, servers do the following:

(a) Servers construct a vector $U_1$ of pairs $(j, E_y[a_j])$.

(b) Servers mix $U_1$ to obtain a vector $U_2$ in which an entry takes the form
$(E_y[j], E_y[a_j])$.

(c) For each $j$, until a match is found, servers perform a distributed
plaintext equality test to see whether $E_y[j] \approx E_y[r_i]$.

(d) When a match is found, servers replace $E_y[r_i]$ with
$E_y[a_j] = E_y[a_{r_i}]$.

3. Let $V_2$ be the resulting vector, and let $(E_y[a_{r_i}], i)$ be an element
therein. For each such pair, the servers apply DIKT to obtain
$(E_{y_{C_i}}[a_{r_i}], i)$. Let the resulting vector be $V_3$.

4. For each element $(E_{y_{C_i}}[a_{r_i}], i)$ in $V_3$, servers send
$E_{y_{C_i}}[a_{r_i}]$ to $C_i$.

5. Consumers decrypt their respective ciphertexts.
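
The blind lookup in step 2 can be sketched in a single process as follows. As with the other sketches, the toy group, the single holder of $x$, the local shuffle in place of a mix network, and the blinded-quotient equality test are simplifying assumptions and not the constructions cited in the text.

```python
import random
import secrets

P, Q, G = 467, 233, 4   # toy group (assumed): P = 2Q + 1, G has order Q

def inv(a):
    return pow(a, P - 2, P)

def keygen():
    x = secrets.randbelow(Q - 1) + 1
    return x, pow(G, x, P)

def enc(y, m):
    k = secrets.randbelow(Q - 1) + 1
    return (m * pow(y, k, P)) % P, pow(G, k, P)

def dec(x, ct):
    return (ct[0] * inv(pow(ct[1], x, P))) % P

def plaintexts_equal(x, c1, c2):
    """Blinded test of whether c1 and c2 encrypt the same plaintext."""
    z = secrets.randbelow(Q - 1) + 1
    eps = pow(c1[0] * inv(c2[0]) % P, z, P)
    zeta = pow(c1[1] * inv(c2[1]) % P, z, P)
    return dec(x, (eps, zeta)) == 1

def blind_lookup(x, y, ads, request_ct):
    """Return E_y[a_r] given E_y[r], without decrypting r."""
    # (a) Pair each index j with an encryption of ad a_j.
    u1 = [(j, enc(y, a)) for j, a in enumerate(ads)]
    # (b) Mix: encrypt the indices (encoded as G^j) and shuffle the entries.
    u2 = [(enc(y, pow(G, j, P)), c) for j, c in u1]
    random.shuffle(u2)
    # (c) Test entries one at a time until the encrypted index matches.
    for index_ct, ad_ct in u2:
        if plaintexts_equal(x, index_ct, request_ct):
            # (d) The matching ad ciphertext replaces the request.
            return ad_ct
    raise ValueError("request does not match any ad index")
```

Because the entries are shuffled before testing, the position at which the match occurs carries no information about which ad was requested; the cost is up to $n$ equality tests per request.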



